I. INTRODUCTION


BACKGROUND 

Current Marine Corps recruiting policies show a strong preference for high school
graduates. These policies are the result of analytic evidence that the educational level
of a recruit is highly correlated with the quality of his later military service. More
specifically, high school graduates have been shown to have higher promotion rates
and lower attrition rates. Thus, by biasing its recruiting to increase the ratio of high
school graduates to nongraduates, the Marine Corps can improve the average quality
of service of its members and can reduce personnel turbulence resulting from discharge
and desertion. Effectively implementing such policies is essential if the Marine Corps
is to compete successfully for recruits in the environment of an All-Volunteer Force.
Evidence suggesting that the number of potential recruits is declining as a result of
fluctuations in U.S. birthrates makes effective implementation of these policies even
more important.

Literature is available that provides lists of variables to be used to predict quality
of performance or success of service, when applied to the total population of potential
recruits. Reference 1, for example, presents a method to predict success in the Marine
Corps as a function of education, age, and the Army Classification Battery (ACB-61)
test scores for the Classification Inventory (CI) and the General Classification Test (GCT).

That model is predicated on the generally accepted assumption that the realities of
supply and demand will force the Marine Corps to continue to accept a certain number
of recruits who are not high school graduates. It, therefore, includes educational level
as a variable. But once the initial question of educational level has been answered, what
factors should be used for subsequent screening? Are they the same for graduates and
nongraduates? Clearly, while graduates are generally preferred, not all graduates
should be accepted. Neither should all nongraduates be rejected. If it is true that a
high school graduate accession rate of 100 percent is impossible, then the Marine Corps
must have the tools with which to select potential recruits from the two population
subgroups. It is unclear whether such tools are currently available. Apparently, the
questions posed above concerning separate screening factors for graduates and
nongraduates have never been answered.

OBJECTIVES

Given the extreme pressures of recruiting, it is important to maximize the accuracy
and usefulness of the model on which recruiting policies are based. This analysis examines
the question of whether forecasts of quality and success can be improved by treating
high school graduates and nongraduates separately. More specifically, the objectives
are as follows:


• To  develop  a (disaggregated)  model  for  separately  predicting  the  success 
of  high  school  graduates  and  nongraduates, 

• To  determine  whether  there  is  a statistically  significant  difference  between 
that  disaggregated  model  and  the  aggregated  model  of  reference  1,  and 

• If  the  observed  differences  are  significant,  to  determine  which  model 
provides  the  better  basis  for  recruiting  policy. 

The methodology developed for comparing the two models will be applied to the
specific questions listed above, but is applicable in a much broader sense -- i.e.,
to forecasting models in general. One can normally find several predictions, or
forecasts, of future conditions. A precise understanding of the nature of the forecasts,
and the differences between them, is therefore of potential importance.


FINDINGS 


This analysis indicates that attrition/success estimates based on an aggregated
model are, in fact, different from those based on a disaggregated model. The aggregated
model treats all potential recruits as members of a single homogeneous population.
The disaggregated model separates, or disaggregates, the high school graduates and
nongraduates and examines them as separate and distinct subgroups. The analysis
further demonstrates that although the observed differences are statistically
significant at the 95-percent confidence level, they are relevant only in a purely theoretical
sense. In a realistic recruiting application, the differences in the predicted attrition
rates of the two alternative models are not operationally important. The aggregated
model is, therefore, appropriate and sufficient as a basis for recruiting policy. Since
it is also the simpler of the two models, it is the preferred alternative.

Although  the  precise  use  of  any  such  model  is  a matter  for  Marine  Corps  decision 
makers,  one  point  worthy  of  note  is  evident  from  this  analysis.  A regression  model, 
properly  applied  as  an  enlistment  screening  device,  can  effectively  enlarge  the  pool 
of  potential  recruits.  When  considered  in  the  context  of  increasingly  competitive  and 
difficult  recruiting,  that  point  is  significant. 
II. DATA AND METHODOLOGY


DATA 

The data for this analysis are the same as those used in reference 1. They consist of
personal characteristics, test scores, and 2-year attrition statistics collected from the
records of approximately 46,000 regular, male, nonprior-service enlistees who
reported for recruit training during FY 1974. These records were obtained from the
Manpower Management System (MMS) and the Recruit Accession Management System (RAMS).

Variables 

The  personal  characteristics  chosen  for  this  analysis  are  age,  race,  and  marital 
status.  In  order  to  be  consistent  with  previous  work  in  this  area,  all  three  are  assumed 
to  be  dichotomous  variables.  Age  (upon  reporting  for  active  duty)  is  either  17-20  or  21 
and  over,  race  is  either  white  or  nonwhite,  and  marital  status  (upon  reporting)  is  either 
married  or  unmarried. 

The test scores are from the 11 ACB-61 subtests administered to each enlistee
upon arrival at the recruit depot. Even though the Armed Services Vocational Aptitude
Battery (ASVAB) has supplanted the ACB-61, it was necessary to use ACB-61 data in
this analysis. The ASVAB was not used until 1975, so performance data for recruits
to whom it was given are limited. Using ACB-61 data should not detract from the
results, however. The objective of this analysis is not to generate specific values of the
prediction variable but, rather, to determine whether a disaggregated model is required
and, if so, how such a model might be developed. These questions can be answered on
the basis of ACB-61 data and the results later generalized to ASVAB data, if necessary.

A variable  for  enlistment  guarantees  was  also  included  in  the  analysis.  Since  such 
guarantees  are  used  to  induce  enlistment,  they  were  included  to  determine  what  effect, 
if  any,  they  have  on  service  after  enlistment. 

Although success is the variable of primary interest, its major determinants are
attrition (desertion and premature discharge) and promotion. Since attrition is also of
interest in its own right, it has been chosen as a surrogate for success and is the
dependent variable for this analysis. Success, then, is indicated by the ability to remain
in service for at least 2 years following enlistment. This choice has the advantages of
simplicity and clarity and facilitates comparisons with reference 1, which likewise used
attrition as a measure of success.


Reference  2 has  taken  a preliminary  look  at  the  service  performance  of  a 2-month 
cohort  of  recruits  who  were  administered  the  ASVAB  tests.  The  results  of  that  analysis 
will  be  commented  upon  later. 


Table 1 lists the explanatory variables and their possible values.


TABLE 1

VARIABLES

Variable                                   Value

Dependent variable
  Attrition                                0 = Neither desertion nor early attrition
                                           1 = Either one or both

Independent/explanatory variables
  Age                                      0 = 17-20
                                           1 = 21 or more
  Race                                     1 = White
                                           2 = Nonwhite
  Marital status                           0 = Unmarried
                                           1 = Married
  Enlistment guarantee                     0 = None
                                           1 = Cash
                                           2 = Noncash
  ACB-61 test scores                       Standard score
    VE  (Verbal)
    AR  (Arithmetic)
    PA  (Pattern Analysis)
    CI  (Classification Inventory)
    MA  (Mechanical Aptitude)
    ACS (Army Coding Speed)
    ARC (Army Radio Code)
    GIT (General Information)
    SM  (Shop Information)
    AI  (Automotive Information)
    ELI (Electronics Information)
  ACB-61 composite
    GCT (General Classification Test)      1/3 (VE + AR + PA)

METHODOLOGY 


The method of analysis is step-wise multiple linear regression of the explanatory
(independent) variables on attrition (the dependent variable). This technique makes it
possible to construct a linear model that will predict attrition (success) as a function
of specified values of the explanatory variables. The step-wise approach also identifies
the relative importance of the explanatory variables in predicting attrition. This approach
allows a user of the model to greatly simplify its form by eliminating (ignoring) the
variables that contribute little to its predictive ability.

The regression is applied to disaggregated subsets of the data base: one containing
high school graduates only, the other nongraduates only.¹ A set of equations to
predict success in the Marine Corps is presented. This model, which differentiates
between high school graduates and nongraduates, is compared with the aggregated model
of reference 1. Observations are made concerning the applicability of the models to
potential Marine Corps uses.


¹ Individuals with a High School General Equivalency Diploma (GED) were included in
the nongraduate subgroup because evidence suggests that they tend to behave more
like nongraduates than graduates in their service performance (reference 1).


III.  RESULTS 


For this analysis, the regression technique was used to produce the
following information:

• Means  and  standard  deviations  for  each  variable; 

• Coefficients  of  correlation  between  each  variable  and  every  other  variable; 
and 

• Regression  statistics  for  five  different  combinations  of  variables. 


The  results  of  these  manipulations  are  shown  in  appendixes  A and  B. 

Table 2 shows the means and standard deviations of each variable for both high
school graduates and nongraduates (95-percent confidence intervals are contained in
appendix A -- as are the coefficients of correlation between every possible pair of
variables). Note that, compared to nongraduates, high school graduates are generally older,
less likely to fall prey to attrition, and more likely to be white and single. They also
have consistently higher scores on the ACB-61 subtests.

The primary output of a regression analysis is an equation that can be used to
predict values of a dependent variable (such as attrition), based on specified or observed
values of one or more independent variables (such as GCT score). The coefficients of the
independent variables in the regression equation indicate their relative importance in
predicting values of the dependent variable. Their sign indicates the direction of the
effect. A negative coefficient, for example, predicts decreasing values of the dependent
variable for increasing values of the independent variable.

The quality of the regression equation is indicated by the R^2 and F statistics. The
R^2 statistic measures the proportion of total variation in the dependent variable that is
explained by the independent variables. The partial F statistic measures the significance
of a given independent variable as a predictor for the dependent variable.


The step-wise procedure used in this analysis selects variables in decreasing order
of their contribution to minimizing residual variation. The first variable selected as a
predictor explains more of the total variation in the dependent variable than any other
single variable. The next variable contributes more to explaining the residual
(remaining) variation than any other remaining variable. This step-wise procedure continues
until the independent variables are exhausted. Normally, some residual variation
remains at the completion of the regression, since a perfect prediction model is seldom
available. Therefore, the value of R^2 will in practice be less than 1. It will, in fact, be
between 0 and 1 in any application of the regression technique to actual (uncontrived)
data.
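
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses NumPy least squares, adds variables one at a time until none remain, and reports the incremental and cumulative R^2 at each step. The function names (`sse`, `forward_stepwise`) are hypothetical, and the sketch is not the program that produced the results in this report.

```python
import numpy as np


def sse(X, y):
    """Error sum of squares of a least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_stepwise(X, y, names):
    """Add variables one at a time, each time picking the one that leaves
    the least residual variation. Returns (name, delta_R2, cumulative_R2)
    tuples in order of selection."""
    total = float(((y - y.mean()) ** 2).sum())  # total variation in y
    chosen, remaining, steps = [], list(range(X.shape[1])), []
    previous_r2 = 0.0
    while remaining:
        best = min(remaining, key=lambda j: sse(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
        r2 = 1.0 - sse(X[:, chosen], y) / total
        steps.append((names[best], r2 - previous_r2, r2))
        previous_r2 = r2
    return steps
```

Each returned tuple corresponds to one row of a table of incremental and cumulative R^2, so a user can see where the marginal contribution becomes negligible and truncate the model there.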
TABLE 2

MEANS AND STANDARD DEVIATIONS

                        Graduates              Nongraduates
Variable             Mean    Std. Dev.       Mean    Std. Dev.

Attrition            0.20      0.40          0.45      0.50
Age                  0.12      0.33          0.08      0.27
Race                 1.20      0.40          1.24      0.43
Marital status       1.06      0.25          1.08      0.27
VE                 108.24     22.22         92.62     20.98
AR                 103.43     22.26         88.63     19.96
PA                 110.96     21.48        100.38     22.00
CI                 102.49     26.45         89.99     27.23
MA                 104.34     20.16         94.56     18.21
ACS                103.00     19.76         92.47     19.49
ARC                 90.06     26.50         77.89     23.80
GIT                 99.41     19.84         87.43     18.52
SM                 101.10     18.90         91.67     17.97
AI                 102.46     19.53         95.75     18.46
ELI                 97.47     23.92         88.07     22.45
GCT                107.54     19.56         93.88     18.05

DEVELOPING  THE  DISAGGREGATED  MODEL 




To  identify  the  variables  that  most  accurately  and  conveniently  predict  attrition, 
the  regression  technique  was  applied  to  five  combinations  of  variables.  Since  the  data 
base  has  been  disaggregated  into  two  subgroups  --  graduates  and  nongraduates  --  ten 
separate  regression  relationships  are  produced.  The  results,  shown  in  appendix  B, 
are  regression  numbers  HS-1  through  HS-5  for  high  school  graduates  and  NG-1  through 
NG-5  for  nongraduates. 

The complete set of explanatory variables was examined first, producing regression
numbers HS-1 and NG-1. Since, ultimately, it is desirable to have a regression model
that is not only as accurate and powerful as possible, but also as simple and useful as
possible, the number of variables was limited on the basis of their relative contribution
to explaining the variation in the observed attrition, as demonstrated in HS-1 and NG-1.
An examination of the "ΔR^2" and "CUMULATIVE R^2" columns of table 3 (or appendix B)
reveals that in terms of their relative contributions to cumulative R^2, two or three
variables would probably be sufficient for the model. Beyond that point, the marginal
contribution to explaining residual variation is small. The three best predictor variables
for the two population subgroups are:

Graduates     Nongraduates

PA            CI
AGE           PA
CI            GUAR

GUAR was eliminated from consideration for the model because it is not considered a
valid enlistment screening device. Enlistment guarantee is used during the recruiting
process as an inducement to deserving and reluctant prospects. It does not, however,
affect the eligibility of the applicant and is therefore not legitimately involved in the
initial screening process. Additionally, GCT was substituted for PA, since it includes
PA and is already widely known and used as an indicator of quality. The loss in
cumulative R^2 that results from the substitution is very small (viz., 0.00027 for graduates
and 0.00031 for nongraduates). The loss is more than justified by using a variable that
has wide application in both recruiting and school assignment. Reference 4, for example,
demonstrates that GCT is an excellent predictor of school performance. GCT has the
additional convenience of being highly correlated with the Mental Group (MG) composite
of the ASVAB, which is currently used.

Thus, the second pair of regressions (HS-2 and NG-2) involved GCT, CI, and AGE.
The third, fourth, and fifth regressions merely tested different combinations of the
variables, such as test scores only and personal characteristics only. The results are
shown in tables B-1 and B-2 of appendix B.


TABLE 3

RESULTS OF VARIABLE GROUPINGS

              Graduates                            Nongraduates
No.     Var     ΔR^2     Cum R^2       No.     Var     ΔR^2     Cum R^2

HS-1    PA      0.0311   0.0311        NG-1    CI      0.0300   0.0300
        AGE     0.0104   0.0415                PA      0.0115   0.0415
        CI      0.0109   0.0524                GUAR    0.0024   0.0439
        GUAR    0.0039   0.0563                AGE     0.0012   0.0451
        GIT     0.0020   0.0583                GIT     0.0011   0.0462
        VE      0.0005   0.0588                RACE    0.0006   0.0468
        AR      0.0009   0.0597                AR      0.0005   0.0473
        RACE    0.0004   0.0603                VE      0.0003   0.0476
        ARC     0.0004   0.0607                MARIT   0.0003   0.0479
        MARIT   0.0003   0.0610                MA      0.0003   0.0482
        ELI     0.0002   0.0612                SM      0.0001   0.0483
        SM      0.0000   0.0612                ARC     0.0001   0.0484
        ACS     0.0000   0.0612                ACS     0.0001   0.0485
        AI      0.0000   0.0612                ELI     0.0001   0.0486
        MA      0.0000   0.0612                AI      0.0000   0.0486

HS-2    GCT     0.0336   0.0336        NG-2    GCT     0.0339   0.0339
        AGE     0.0116   0.0452                CI      0.0072   0.0411
        CI      0.0069   0.0521                AGE     0.0025   0.0436

HS-3    PA      0.0311   0.0311        NG-3    CI      0.0300   0.0300
        CI      0.0100   0.0411                PA      0.0115   0.0415
        GIT     0.0022   0.0433                AR      0.0015   0.0430
        ARC     0.0005   0.0438                GIT     0.0007   0.0437
        VE      0.0006   0.0444                MA      0.0003   0.0440
        AR      0.0008   0.0452                VE      0.0002   0.0442
        ELI     0.0002   0.0454                ARC     0.0001   0.0443
        SM      0.0001   0.0455                ACS     0.0001   0.0444
        ACS     0.0000   0.0455                SM      0.0001   0.0445
        AI      0.0000   0.0455                ELI     0.0000   0.0445
        MA      0.0000   0.0455                AI      0.0000   0.0445

HS-4    GCT     0.0336   0.0336        NG-4    GCT     0.0339   0.0339
        CI      0.0064   0.0400                CI      0.0073   0.0412
        GUAR    0.0043   0.0443                GUAR    0.0017   0.0429

HS-5    AGE     0.0128   0.0128        NG-5    RACE    0.0031   0.0031
        RACE    0.0070   0.0198                AGE     0.0016   0.0047
        MARIT   0.0002   0.0200                MARIT   0.0004   0.0051

The three variables GCT, CI, and AGE are seen to explain more of the variation
in observed attrition than any other combination of variables tested, except the complete
set. The loss in explained variation (R^2), resulting from the reduction from 15 to 3
variables, is very small (on the order of 1/2 to 1 percent of total variation) and is more
than offset by the gain in simplicity and potential usefulness. The F-ratios for the three
variables are large enough to indicate with 99-percent certainty that they appear in the
regression equation because of true statistical association with attrition and not by chance.
Thus, the attrition model selected is a linear combination of the GCT, CI, and AGE variables.

Before proceeding, it should be noted that test scores are clearly superior to
personal characteristics as predictors of attrition. Therefore, testing of enlistment
candidates is of paramount importance if successful screening -- and effective recruiting --
is to be accomplished. It remains true, however, that combinations of test scores and
personal characteristics (viz., GCT, CI, and AGE) provide the best available basis for
an enlistment screening model.

TWO  ATTRITION  MODELS 


We  now  have  two  consistent,  but  slightly  different,  models  for  predicting  attrition: 
the  aggregated  model  of  reference  1 and  the  disaggregated  model  developed  herein. 

These two models may be represented as:

• AGGREGATED MODEL

$$A_{AG} = 0.8694 + 0.1090\,(\mathrm{AGE}) - 0.0029\,(\mathrm{GCT}) - 0.0017\,(\mathrm{CI}) - 0.1870\,(\mathrm{ED})$$

• DISAGGREGATED MODEL

$$A_{HS} = 0.6150 + 0.1341\,(\mathrm{AGE}) - 0.0025\,(\mathrm{GCT}) - 0.0015\,(\mathrm{CI})$$

and

$$A_{NG} = 0.9333 + 0.0714\,(\mathrm{AGE}) - 0.0034\,(\mathrm{GCT}) - 0.0019\,(\mathrm{CI}),$$

where

A  = probability of attrition,
AG = the aggregated model,
HS = high school graduate,
NG = nongraduate, and
ED = a dichotomous variable for educational level (i.e., ED = 1
     for a high school graduate and ED = 0 for a nongraduate).


Tables  4 through  7 show  the  attrition  rates  predicted  by  the  disaggregated  model. 
Similar  tables  for  the  aggregated  model  are  contained  in  reference  1 . 
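
As an illustration, the two models can be written as small Python functions using the coefficients given above. The function and argument names are hypothetical; AGE is coded as in table 1 (0 for ages 17-20, 1 for 21 or more) and ED as defined for the aggregated model.

```python
def attrition_aggregated(age_21_or_more, gct, ci, hs_graduate):
    """Predicted attrition probability from the aggregated model of reference 1."""
    age = 1 if age_21_or_more else 0
    ed = 1 if hs_graduate else 0
    return 0.8694 + 0.1090 * age - 0.0029 * gct - 0.0017 * ci - 0.1870 * ed


def attrition_disaggregated(age_21_or_more, gct, ci, hs_graduate):
    """Predicted attrition probability from the disaggregated model:
    one equation for graduates, another for nongraduates."""
    age = 1 if age_21_or_more else 0
    if hs_graduate:
        return 0.6150 + 0.1341 * age - 0.0025 * gct - 0.0015 * ci
    return 0.9333 + 0.0714 * age - 0.0034 * gct - 0.0019 * ci


# Example: a graduate aged 17-20 with GCT = 120 and CI = 120.
example = attrition_disaggregated(False, 120, 120, True)
```

Evaluating the disaggregated function over a grid of GCT and CI scores from 120 down to 60 reproduces the layout of tables 4 through 7.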
TABLE 4

PREDICTED ATTRITION RATES:
HIGH SCHOOL GRADUATES, AGES 17-20

                              GCT
 CI      120     110     100      90      80      70      60

120    0.135   0.160   0.185   0.210   0.235   0.260   0.285
110    0.150   0.175   0.200   0.225   0.250   0.275   0.300
100    0.165   0.190   0.215   0.240   0.265   0.290   0.315
 90    0.180   0.205   0.230   0.255   0.280   0.305   0.330
 80    0.195   0.220   0.245   0.270   0.295   0.320   0.345
 70    0.210   0.235   0.260   0.285   0.310   0.335   0.360
 60    0.225   0.250   0.275   0.300   0.325   0.350   0.375

Attrition = 0.6150 - 0.0025(GCT) - 0.0015(CI)

TABLE 5

PREDICTED ATTRITION RATES:
HIGH SCHOOL GRADUATES, AGES 21 AND OVER

                              GCT
 CI      120     110     100      90      80      70      60

120    0.269   0.294   0.319   0.344   0.369   0.394   0.419
110    0.284   0.309   0.334   0.359   0.384   0.409   0.434
100    0.299   0.324   0.349   0.374   0.399   0.424   0.449
 90    0.314   0.339   0.364   0.389   0.414   0.439   0.464
 80    0.329   0.354   0.379   0.404   0.429   0.454   0.479
 70    0.344   0.369   0.394   0.419   0.444   0.469   0.494
 60    0.359   0.384   0.409   0.434   0.459   0.484   0.509

Attrition = 0.7491 - 0.0025(GCT) - 0.0015(CI)

TABLE 6

PREDICTED ATTRITION RATES:
NONGRADUATES, AGE 17-20

                              GCT
 CI      120     110     100      90      80      70      60

120    0.297   0.331   0.365   0.399   0.433   0.467   0.501
110    0.316   0.350   0.384   0.418   0.452   0.486   0.520
100    0.335   0.369   0.403   0.437   0.471   0.505   0.539
 90    0.354   0.388   0.422   0.456   0.490   0.524   0.558
 80    0.373   0.407   0.441   0.475   0.509   0.543   0.577
 70    0.392   0.426   0.460   0.494   0.528   0.562   0.596
 60    0.411   0.445   0.479   0.513   0.547   0.581   0.615

Attrition = 0.9333 - 0.0034(GCT) - 0.0019(CI)


TABLE 7

PREDICTED ATTRITION RATES:
NONGRADUATES, AGE 21 AND OVER

                              GCT
 CI      120     110     100      90      80      70      60

120    0.369   0.403   0.437   0.471   0.505   0.539   0.573
110    0.388   0.422   0.456   0.490   0.524   0.558   0.592
100    0.407   0.441   0.475   0.509   0.543   0.577   0.611
 90    0.426   0.460   0.494   0.528   0.562   0.596   0.630
 80    0.445   0.479   0.513   0.547   0.581   0.615   0.649
 70    0.464   0.498   0.532   0.566   0.600   0.634   0.668
 60    0.483   0.517   0.551   0.585   0.619   0.653   0.687

Attrition = 1.0047 - 0.0034(GCT) - 0.0019(CI)
COMPARING THE MODELS


Having  constructed  two  models  for  generating  estimates  of  the  same  variable 
(attrition),  it  is  now  necessary  to  determine  whether  the  differences  are  statistically 
significant  at  a given  level  and,  if  so,  to  quantify  and  interpret  them.  The  first  part 
of  this  comparison  requires  an  hypothesis  test  in  which  the  null  hypothesis  is  that  there 
is  no  difference  between  the  two  models.  The  hypothesis  test  uses  a variation  of  the 
mean  square  error  test  for  exact  linear  restriction  in  regression  (reference  5).  A 
statistic  u is  distributed  as  the  noncentral  F with  m and  n degrees  of  freedom: 

$$u = \frac{[\mathrm{SSE}(B) - \mathrm{SSE}(b)]/m}{\mathrm{SSE}(b)/n}$$

where SSE(B) = the calculated error sum of squares in the restricted regression,
and SSE(b) = the calculated error sum of squares in the unrestricted regression.

The mean square error test is essentially a test of the significance of the (mean
square) error introduced by constraining the regression to linearity in a specified
number of variables. It is frequently used to test an hypothesis of linearity. The
variation used in this hypothesis test measures the significance of the differences
between the errors introduced by the two sets of constraint variables. It may be expressed
as:

$$F = \frac{[\mathrm{SSE}(A) - \mathrm{SSE}(D)]/m}{\mathrm{SSE}(D)/n},$$

where SSE(A) = Error sum of squares in the regression of the aggregated model,

SSE(D) = Error sum of squares in the regression of the disaggregated model
(equal to the sum of the error sums of squares of the high school
graduate and nongraduate portions of the model),

m = number of variables in the regression,

and n = number of observations (sample size).

This equation yields an F statistic with a value of F = 11.9. Since this is greater
than $F_{0.05}(3, \infty) = 2.60$, we reject the null hypothesis and conclude that the observed
differences between the two models are significant at the 95-percent confidence level.
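
The arithmetic of this test is simple enough to sketch in Python. The function name below is hypothetical and the sketch only restates the ratio defined above; the value 11.9 and the critical value 2.60 are those reported in the text, while the error sums of squares themselves are not given in this section and are therefore not reproduced.

```python
def mean_square_error_f(sse_aggregated, sse_disaggregated, m, n):
    """F ratio comparing the aggregated (restricted) and disaggregated
    (unrestricted) regressions: [(SSE(A) - SSE(D)) / m] / [SSE(D) / n]."""
    return ((sse_aggregated - sse_disaggregated) / m) / (sse_disaggregated / n)


def reject_null(f_value, f_critical):
    """Reject the hypothesis of no difference when F exceeds the critical value."""
    return f_value > f_critical


# Decision for the values reported in the text.
decision = reject_null(11.9, 2.60)
```

Because the sample is very large (approximately 46,000 observations), even a small relative reduction in the error sum of squares can produce an F ratio well above the critical value, which is consistent with the later finding that the difference is statistically significant but operationally small.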

To quantify the differences between the models, an approach patterned after that
in reference 6 was adopted. The attrition models are assumed to be enlistment screening
devices that determine whether a potential enlistee is accepted. For each potential
enlistee, the models produce an estimate of probable attrition, based on his educational
status, age, and test scores (GCT and CI). Such an enlistee would then be accepted if
his probable attrition were less than some maximum acceptable level specified by
higher headquarters. Otherwise, he would be rejected. This use of the model parallels
the U.S. Navy practice of computing SCREEN¹ scores. For the sake of convenience,
we designate the output of each model as the individual's CAREM² score.

Each of the approximately 46,000 individuals in the data base has a unique CAREM
score because each has a unique set of personal characteristics and test scores. These
individuals can, therefore, be grouped according to whether the models, if used as
enlistment screening devices, would have admitted or rejected them for a given CAREM
cutting score. They can further be subgrouped by whether they stayed for 2 years
following enlistment or were subject to attrition. Tables 8 and 9 show the following:

• Correct  predictions  by  the  models 

- Correct  acceptances:  The  number  of  persons  who  would  have  been 
accepted  by  the  model  and  who  actually  stayed  in  service  the  required 
2 years. 

- Correct  rejections:  The  number  of  persons  who  would  have  been 
rejected  by  the  model  and  who  actually  did  not  stay  for  2 years. 

• Incorrect  predictions  by  the  models 

- Incorrect  acceptances:  The  number  of  persons  who  would  have  been 
accepted  by  the  model,  but  who  did  not  stay  in  service  for  2 years. 

- Incorrect  rejections:  The  number  of  persons  who  would  have  been 
rejected  by  the  model,  but  who  actually  stayed  for  2 years. 
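
The four categories above can be tallied with a short Python sketch. The names are hypothetical; the only rule taken from the text is that an individual is accepted when his CAREM score (predicted attrition) is less than the cutting score.

```python
from collections import Counter


def screening_outcomes(scores, attrited, cutting_score):
    """Tally correct/incorrect acceptances and rejections.

    scores   -- CAREM scores (predicted attrition probabilities)
    attrited -- True if the individual did not complete 2 years
    """
    counts = Counter()
    for score, left_early in zip(scores, attrited):
        accepted = score < cutting_score
        if accepted and not left_early:
            counts["correct_acceptance"] += 1
        elif accepted and left_early:
            counts["incorrect_acceptance"] += 1
        elif not accepted and left_early:
            counts["correct_rejection"] += 1
        else:
            counts["incorrect_rejection"] += 1
    return counts


def percent_correct(counts):
    """Share of all individuals the screen classified correctly, in percent."""
    total = sum(counts.values())
    correct = counts["correct_acceptance"] + counts["correct_rejection"]
    return 100.0 * correct / total
```

Running `screening_outcomes` once per model and per cutting score would yield counts of the kind summarized in table 8, and `percent_correct` the kind of figure quoted from table 9.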

For  every  specified  value  of  the  CAREM  cutting  score,  the  disaggregated  model 
can  be  seen  to  be  superior  to  the  aggregated  model,  if  only  slightly  so.  For  example, 
at  a cutting  score  of  0.50,  the  aggregated  model  would  have  been  correct  68.2  percent 
of  the  time  had  it  been  applied  as  an  enlistment  criterion  to  the  individuals  in  the  data 
base  (see  table  9).  The  comparable  figure  for  the  disaggregated  model  is  slightly 
better  at  68.8  percent. 

The ability of these two models to correctly accept and reject potential enlistees
can be compared by using the Chi-Square test illustrated in appendix C. The null
hypothesis for the comparisons is that the number of applicants accepted and/or
rejected is independent of the choice of models -- i.e., that they are essentially identical.


¹ SCREEN is an acronym for Success Chances for Recruits Entering the Navy.

² CAREM is an acronym for Chances of Attrition for Recruits Entering the Marines.
TABLE 8: NUMBERS OF CORRECT AND INCORRECT PREDICTIONS

[Table body not recoverable. Surviving Chi-square (χ²) row: 3.80  5.34  41.0[?]  54.88  41.06  22.66]


TABLE 9: PERCENTAGE OF CORRECT AND INCORRECT PREDICTIONS

[Table body not recoverable.]



The Chi-Square values for each of the seven CAREM cutting scores, calculated by the
procedures of appendix C, are shown at the bottom of table 8. Since the critical
Chi-Square value (for 95-percent confidence and 3 degrees of freedom) is 7.8, the null
hypothesis can be rejected for CAREM cutting scores of 0.40 and below -- indicating
that the models produce different results in this range. For scores of 0.45 and 0.50,
however, the results do not appear to be significantly different for the two models.
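
A test of this kind can be sketched as follows. Appendix C is not part of this section, so the layout is an assumption: a 2-by-4 table with one row per model and one column per outcome category (correct acceptance, incorrect acceptance, correct rejection, incorrect rejection), which gives the 3 degrees of freedom quoted above. The counts shown are hypothetical and serve only to exercise the function.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of equal-length rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Standard tabulated critical value for 95-percent confidence and 3 degrees
# of freedom; the text rounds it to 7.8.
CRITICAL_95_DF3 = 7.815

# Hypothetical counts for two models at a single cutting score.
hypothetical_counts = [
    [600, 150, 100, 150],  # aggregated model
    [620, 160, 90, 130],   # disaggregated model
]
chi2, df = chi_square_independence(hypothetical_counts)
models_differ = chi2 > CRITICAL_95_DF3
```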

By  varying  the  cutting  scores,  one  can  simultaneously  affect  both  the  number  of 
people  the  screening  model  will  accept  and  the  attrition  probability  of  those  accepted. 

These  effects  are  shown  in  figures  1 and  2 and  are  combined  in  figure  3,  which  shows 
the  relationship  between  attrition  and  the  size  of  the  potential  recruit  pool.  These  figures 
demonstrate  that  by  reducing  the  cutting  score,  one  can  reduce  the  probability  of  attrition 
for  those  entering  the  service,  but  only  at  the  cost  of  a significant  reduction  in  the  number 
of  people  eligible  for  enlistment.  These  effects  can  best  be  quantified  by  the  elasticities 
of  eligibility  and  attrition  with  respect  to  cutting  score. 


FIG. 1: EFFECT OF CUTTING SCORES ON THE SIZE OF THE ELIGIBLE POPULATIONᵃ

ᵃ The percentage of eligibles is measured by the proportion of total acceptances (both correct and incorrect)
by the screening model.


Elasticity is an expression of the amount of change in a dependent variable that
results from a given change in an independent variable -- at a given value of the
independent variable. For two variables related by the expression Y = f(X), the
elasticity of Y with respect to X at X = X_1 may be defined as:

    e = \frac{(Y_2 - Y_1)/Y_1}{(X_2 - X_1)/X_1}


At a cutting score of 0.45, the elasticity of attrition is 2.3, while that of eligibility is
1.4. As a result, for a one-percent reduction in cutting score, one can achieve a
2.3-percent reduction in attrition, but only at the expense of a 1.4-percent reduction in
eligibility. It should also be noted that elasticities are not constant throughout the range
of the variables. They apply only to values in the immediate vicinity of the point at which
they are calculated. The cutting score of 0.45 was deliberately chosen as the point at which to
calculate the elasticities because it appears to be in the vicinity of values most relevant
to realistic and effective recruiting.
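
As an illustration of the definition, the sketch below computes an elasticity from two nearby points and applies it as a first-order approximation. The function names and the two sample points are hypothetical; the points were chosen only so that the result matches the elasticity of 2.3 quoted in the text, and they are not read from figures 1 through 3.

```python
def elasticity(x1: float, y1: float, x2: float, y2: float) -> float:
    """Elasticity of Y with respect to X at X = x1, from two nearby points."""
    return ((y2 - y1) / y1) / ((x2 - x1) / x1)

def percent_change_in_y(e: float, percent_change_in_x: float) -> float:
    """First-order percent change in Y implied by elasticity e (valid only near x1)."""
    return e * percent_change_in_x

# Hypothetical points: a 1-percent cut in cutting score with a 2.3-percent drop in attrition.
x1, y1 = 0.45, 0.2000
x2, y2 = 0.4455, 0.1954
print(round(elasticity(x1, y1, x2, y2), 2))

# Elasticities quoted in the text at a cutting score of 0.45.
print(percent_change_in_y(2.3, -1.0))  # percent change in attrition
print(percent_change_in_y(1.4, -1.0))  # percent change in eligibility
```

Because the elasticity is a local quantity, the second function should not be used for large changes in cutting score.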


Given  the  significant  impact  of  reductions  in  cutting  score  on  the  size  of  the  eligible 
population,  there  appears  to  be  a limit  to  the  amount  of  attrition  one  can  realistically 
hope  to  eliminate.  Since  cutting  scores  above  0.45  would  deny  enlistment  to  nearly  one 
out  of  every  three  potential  recruits  (figure  1),  reductions  below  this  point  are  undoubtedly 
inconsistent  with  successful  recruiting  efforts.  At  the  same  time,  it  is  undesirable  to 
allow individual attrition probabilities greater than about 0.50. We are therefore restricted
to  a practical  range  of  cutting  scores  from  0.40  to  0.50.  This  range  of  values  presents 
a dilemma,  however.  The  F-test  indicated  that  the  differences  between  the  two  models 
were  statistically  significant  for  all  values  of  the  cutting  score,  while  the  Chi-Square 
test indicated significance only for values less than 0.45. Given this apparent
contradiction, one is forced to conclude that these standard comparison tests do little more than
indicate  that  the  models  are  "almost,  but  not  quite"  different  (or  the  same)  in  terms  of 
what  they  predict. 


A common-sense approach to this dilemma leads one to compare the models on
the basis of their performance in a hypothetical recruiting application. Assuming that
the average annual recruit accession goal of the Marine Corps is 50,000 recruits, the
number of mistakes (incorrect decisions) that would be made by each model is shown
in table 10 for two realistic values of a possible CAREM cutting score.

TABLE 10

ATTRITION MODEL PERFORMANCE

                            Cutting    Incorrect      Incorrect
Model                       score      acceptances    rejections    Total

Aggregated model (A)        0.50       12,641         3,240         15,881
                            0.45        9,880         6,366         16,246

Disaggregated model (D)     0.50       12,504         3,326         15,830
                            0.45       10,026         6,186         16,212

Difference (A-D)            0.50          137           -86             51
                            0.45         -146           180             34


The disaggregated model would apparently reject 86 more potentially successful
Marines than would the aggregated model at CAREM = 0.50. This involves an
indeterminate, but very real, recruiting cost. At the same CAREM level, however, the
disaggregated model would also accept fewer failures, resulting in a savings in
training costs, pay, and turbulence (discharges, etc.). The degree to which these potential
costs and savings would offset each other is not known; but, overall, the disaggregated
model would make approximately 50 fewer "mistakes" per year than would the
aggregated model. Thus, in practical terms, the difference is less than one mistake per
week. At a cutting score of CAREM = 0.45, the exact figures are slightly different,
but the final result is essentially the same. It is not considered realistic to ascribe
any practical significance to such a small difference.
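
The arithmetic behind this comparison can be sketched as follows. The counts are the incorrect acceptances and rejections from the CAREM = 0.50 contingency table in appendix C (sample of 46,085 per model). Scaling them proportionally to 50,000 accessions is an assumption about how table 10 was built; it is not stated in the text, although it reproduces the CAREM = 0.50 rows of the table. The names used are hypothetical.

```python
# Incorrect decisions at CAREM = 0.50, from the appendix C contingency table.
SAMPLE_SIZE = 46_085
INCORRECT = {
    "aggregated": {"acceptances": 11_651, "rejections": 2_986},
    "disaggregated": {"acceptances": 11_525, "rejections": 3_066},
}
ANNUAL_ACCESSIONS = 50_000  # accession goal assumed in the text

def scale(count: int) -> int:
    """Scale a sample count proportionally to the annual accession goal (assumed method)."""
    return round(count * ANNUAL_ACCESSIONS / SAMPLE_SIZE)

scaled = {model: {kind: scale(n) for kind, n in counts.items()}
          for model, counts in INCORRECT.items()}
totals = {model: sum(counts.values()) for model, counts in scaled.items()}
difference = totals["aggregated"] - totals["disaggregated"]

print(scaled)
print(totals)
print(difference, "fewer mistakes per year;", round(difference / 52, 2), "per week")
```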


It  should  be  noted  at  this  point  that  establishing  the  cutting  score  at  CAREM  = 0.45 
does  not  imply  a net  attrition  rate  of  45  percent.  It  simply  eliminates  from  eligibility 
all individuals with a probability of attrition of 0.45 or greater. Eliminating those
individuals drastically alters the attrition probability distribution of the remaining eligible
population  so  that  the  expected  Corps-wide  attrition  would  actually  be  about  20  percent, 
as  illustrated  in  figure  2.  Such  an  attrition  rate  is  consistent  with  current  Marine  Corps 
goals. 
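
The distinction between the cutting score and the resulting attrition rate can be illustrated with a small sketch. The probabilities below are hypothetical values invented for illustration, not data from this study, and the function name is an assumption; the point is only that the mean attrition probability of those who pass the screen lies well below the cutting score.

```python
from statistics import mean

def screen(probabilities, cutting_score):
    """Return (expected attrition of eligibles, share of applicants eligible).

    An applicant is eligible when the predicted attrition probability is
    strictly below the cutting score.
    """
    eligible = [p for p in probabilities if p < cutting_score]
    share = len(eligible) / len(probabilities)
    rate = mean(eligible) if eligible else float("nan")
    return rate, share

# Hypothetical predicted attrition probabilities for eight applicants.
hypothetical = [0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60]
rate, share = screen(hypothetical, 0.45)
print(f"eligible share = {share:.2f}, expected attrition among eligibles = {rate:.3f}")
```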
IV. SUMMARY AND CONCLUSIONS


The fact that high school graduates perform better than nongraduates in Marine Corps
service has been amply demonstrated by numerous analyses. In fact, the education
variable so dominates every existing prediction model that a real danger exists that, once
its influence is eliminated, a new and different set of predictor variables might emerge.

A separate examination of high school graduates and nongraduates has produced an
attrition model that differs from the model in reference 1, which treats graduates and
nongraduates as a single aggregated population. The two models contain the same variables,
but the relative importance of the variables is different. The two models therefore
produce different results in terms of the attrition they predict. They also produce different
results when applied as enlistment screening devices. The differences, however, have
no practical significance. The relative advantages that might accrue from selecting
one model over the other are negligible.

It  is  therefore  concluded  that  there  is  no  advantage  to  be  gained  from  applying 
different  enlistment  eligibility  criteria  for  high  school  graduates  and  nongraduates. 

There  is  no  justification  for  using  the  relatively  more  complicated  dual  (disaggregated) 
model. A model patterned after that in reference 1 is sufficient. Although these
conclusions are based on an analysis of data containing ACB-61 scores, there is no reason
to  suspect  that  they  would  change  if  the  analysis  were  applied  to  ASVAB  data.  The 
underlying  principles  are  the  same  in  either  case.  The  results  are,  in  fact,  consistent 
with  those  reported  in  reference  2,  which  is  based  on  a preliminary  examination  of 
ASVAB  data. 

With regard to the actual use of an enlistment screening (CAREM) model in a recruiting
application, it should be noted that such use could allow the weaknesses of a potential
recruit to be partially (or even completely) offset by his strong points. For example, a
nongraduate, 17-year-old applicant with a low GCT (say 70) and a CI score of 120 has a
probability of attrition of approximately 41 percent. Depending on the prevailing
recruiting standards and quotas, this applicant may be rejected (denied enlistment) because of
his low GCT. On the other hand, a similar applicant with a GCT of 110 and a CI of 60
may be accepted because of his relatively high GCT. This second applicant, however,
has approximately the same probability of attrition as the first and should, therefore,
have the same chance of acceptance. While such a large GCT-CI divergence is unlikely,
it is the principle that should be considered, not the exact numbers. The fact is that
tradeoffs among enlistment screening variables are possible when a CAREM-type
regression model is used. Figure 4 illustrates the GCT-CI tradeoffs for nongraduates, ages
17 to 20, for two values of predicted attrition.
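
The tradeoff principle can be sketched with a linear model. The coefficients below are those listed for regression NG-2 in table B-2 (constant 0.9333, GCT -0.0034, CI -0.0019, age 0.0714). Using NG-2 here is an assumption made for illustration: the 41-percent example above may rest on a different model, so the numbers are not expected to match it. The function names and the 0/1 age coding (1 for 21 or over) are also assumptions.

```python
# Coefficients as listed for regression NG-2 in table B-2 (illustrative use only).
CONSTANT = 0.9333
B_GCT = -0.0034
B_CI = -0.0019
B_AGE = 0.0714  # applied when the age indicator is 1 (assumed to mean 21 or over)

def predicted_attrition(gct: float, ci: float, age21: int = 0) -> float:
    """Linear prediction of attrition probability."""
    return CONSTANT + B_GCT * gct + B_CI * ci + B_AGE * age21

def ci_needed(gct: float, target: float, age21: int = 0) -> float:
    """CI score that holds predicted attrition at `target` for a given GCT."""
    return (target - CONSTANT - B_GCT * gct - B_AGE * age21) / B_CI

# Points along one iso-attrition line for applicants aged 17 to 20.
for gct in (70, 90, 110):
    ci = ci_needed(gct, 0.45)
    print(gct, round(ci, 1), round(predicted_attrition(gct, ci), 4))

# Each GCT point can be traded for B_GCT / B_CI points of CI.
print(round(B_GCT / B_CI, 2))
```

Under these coefficients the iso-attrition lines are straight, which is the kind of relationship figure 4 is described as showing.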


Allowing such tradeoffs among the variables has the distinct advantage of enlarging
the size of the eligible population by making persons eligible who otherwise would not be.
In a recruiting environment where recruit demand exceeds supply, such a step should be
seriously considered.

Concerning  the  general  applicability  of  the  methodology:  There  appears  to  be  a 
danger  in  sanguinely  accepting  the  results  of  comparisons  based  on  F and  Chi-square 
tests.  The  results  of  such  tests  should  be  examined  carefully  for  their  applicability  and 
checked,  if  possible,  in  a practical  application  of  the  models  being  compared.  Only  then 
can  one  derive  a clear  understanding  of  the  nature  of  the  forecasts  being  made. 
APPENDIX A


MEANS, STANDARD DEVIATIONS, CONFIDENCE INTERVALS,

AND CORRELATION COEFFICIENTS

This appendix contains tables that display the means, standard deviations, and
confidence intervals for observed values of the dependent and independent variables.
It also contains tables that show the coefficients of correlation between all possible
pairs of variables. The column and row abbreviations are as follows:

Abbreviation    Variable

VATTR           Attrition and/or desertion during first 2 years
AGE 21          Age at enlistment (17-20 or over 21)
RACE            Race (white or nonwhite)
MARIT           Marital status at enlistment (married or unmarried)
GUAR            Enlistment guarantee (none, cash, or noncash)
MCRD            Marine Corps Recruit Depot (This variable was in the
                data base and therefore appears in the table; however,
                it was not used in the analysis.)

ACB-61 subtests:

VE              Verbal
AR              Arithmetic
PA              Pattern Analysis
CI              Classification Inventory
MA              Mechanical Aptitude
ACS             Army Coding Speed
ARC             Army Radio Code
GIT             General Information
SM              Shop Information
AI              Automotive Information
ELI             Electronics Information
GCT             General Classification Test


TABLE A-1

HIGH SCHOOL GRADUATE MEANS, STANDARD DEVIATIONS,
AND CONFIDENCE INTERVALS

                                     Standard     95-percent
Variable                 Mean        deviation    confidence interval

Attrition                0.2029      0.4022       0.1976 - 0.2082
Age                      0.1249      0.3307       0.1206 - 0.1292
Race                     1.2045      0.4034       1.1992 - 1.2096
Marital status           1.0647      0.2532       1.0614 - 1.0680
Enlistment guarantee     0.6391      0.5825       0.6315 - 0.6467

ACB-61 scores:

VE                       108.24      22.22        107.9487 - 108.5313
AR                       103.43      22.26        103.1382 - 103.7218
PA                       110.96      21.48        110.6784 - 111.2416
CI                       102.49      26.45        102.1433 - 102.8367
MA                       104.34      20.16        104.0757 - 104.6043
ACS                      103.00      19.76        102.7410 - 103.2590
ARC                       90.06      26.50         89.7126 -  90.4074
GIT                       99.41      19.84         99.1499 -  99.6701
SM                       101.10      18.90        100.8524 - 101.3478
AI                       102.46      19.53        102.2040 - 102.7160
ELI                       97.47      23.92         97.1564 -  97.7836
GCT                      107.54      19.56        107.2836 - 107.7964

1 The precise meaning of a confidence interval for a dichotomous
variable is unclear. Values are shown primarily for convenience
and consistency.
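
The tabulated intervals can be related to the standard deviations with a short calculation. This sketch assumes the intervals were formed as mean ± 1.96 × (standard deviation / √n), the usual normal approximation; the appendix does not state the formula or the sample size, so both the method and the function name implied_n are assumptions. Under that assumption, the half-width of an interval implies the number of observations behind it.

```python
Z_95 = 1.96  # normal-approximation multiplier for a 95-percent interval (assumed)

def implied_n(sd: float, lower: float, upper: float) -> float:
    """Sample size implied by a confidence interval of the form mean +/- Z * sd / sqrt(n)."""
    half_width = (upper - lower) / 2.0
    return (Z_95 * sd / half_width) ** 2

# Rows taken from table A-1 (high school graduates).
print(round(implied_n(22.22, 107.9487, 108.5313)))  # VE
print(round(implied_n(19.56, 107.2836, 107.7964)))  # GCT
```

If the assumption holds, rows from the same table should imply roughly the same sample size, which gives a quick consistency check on individual entries.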
TABLE A-2

NONGRADUATE MEANS, STANDARD DEVIATIONS,
AND CONFIDENCE INTERVALS

                                     Standard     95-percent
Variable                 Mean        deviation    confidence interval

Attrition                0.4469      0.4972       0.4406 - 0.4532
Age                      0.0781      0.2683       0.0742 - 0.0815
Race                     1.2457      0.4293       1.2382 - 1.2492
Marital status           1.0763      0.2738       1.0728 - 1.0798
Enlistment guarantee     0.4310      0.4965       0.4275 - 0.4373

ACB-61 scores:

VE                        92.62      20.98         92.5373 -  92.7027
AR                        88.63      19.96         88.3755 -  88.8847
PA                       100.38      22.00        100.0993 - 100.6607
CI                        89.99      27.23         89.6425 -  90.3375
MA                        94.56      18.21         94.3276 -  94.7924
ACS                       92.47      19.49         92.2213 -  92.7187
ARC                       77.89      23.80         77.5863 -  78.1937
GIT                       87.43      18.52         87.1937 -  87.6663
SM                        91.67      17.97         91.4407 -  91.8993
AI                        95.75      18.46         95.5144 -  95.9856
ELI                       88.07      22.45         87.7835 -  88.3565
GCT                       93.88      18.05         93.6497 -  94.1105
APPENDIX B


REGRESSION RESULTS

This appendix contains the results of ten separate regressions: one for each of
five combinations of variables for both high school graduates and nongraduates. The
tables show the successive values of the ΔR² and F statistics, the cumulative R² values,
the regression coefficients for each variable, and the regression constants. The
constants appear as the first entry in the "coefficient" column opposite the regression
number.
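
The relationship between the successive and cumulative R² columns can be sketched as a running sum. The increments below are the values listed for regression HS-2 in table B-1; the variable names in the code are hypothetical.

```python
from itertools import accumulate

# Successive R-squared increments for regression HS-2, in order of entry.
delta_r2 = {"GCT": 0.03355, "AGE": 0.01162, "CI": 0.00695}

# The cumulative R-squared after each step is the running sum of the increments.
cumulative = dict(zip(delta_r2, accumulate(delta_r2.values())))

for variable, value in cumulative.items():
    print(f"{variable:4s} {value:.5f}")
```

Small disagreements in the last digit between such a running sum and the tabulated cumulative values are to be expected from rounding of the increments.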




TABLE B-1

REGRESSION RESULTS: HIGH SCHOOL GRADUATES

Regression                             Cumulative
number      Variable      ΔR²          R²            Coefficient    F

HS-1        (constant)                                0.6633
            PA            0.03113      0.03113       -0.0013         42.6
            AGE           0.01038      0.04151        0.1256        234.5
            CI            0.01088      0.05239       -0.0014        121.2
            GUAR          0.00393      0.05632       -0.0437         84.3
            GIT           0.00200      0.05832       -0.0017         49.1
            VE            0.00053      0.05885        0.0011         26.4
            AR            0.00094      0.05980       -0.0009         17.9
            RACE          0.00041      0.06020       -0.0251         10.7
            ARC           0.00043      0.06063       -0.0004          8.4
            MARIT         0.00029      0.06092        0.0277          6.8
            ELI           0.00017      0.06109        0.0004          4.8
            SM            0.00005      0.06114       -0.0002          0.9
            ACS           0.00000      0.06115        0.0000          0.0
            AI            0.00000      0.06115        0.0000          0.0
            MA            0.00000      0.06115        0.0000          0.0

HS-2        (constant)                                0.6150
            GCT           0.03355      0.03355       -0.0025        241.3
            AGE           0.01162      0.04517        0.1341        286.3
            CI            0.00695      0.05212       -0.0015        163.8

HS-3        (constant)                                0.6774
            PA            0.03113      0.03113       -0.0015         62.9
            CI            0.00998      0.04111       -0.0014        117.2
            GIT           0.00215      0.04326       -0.0015         38.8
            ARC           0.00054      0.04380       -0.0004          8.1
            VE            0.00061      0.04441        0.0011         25.3
            AR            0.00082      0.04523       -0.0009         16.8
            ELI           0.00016      0.04539        0.0003          4.3
            SM            0.00011      0.04550       -0.0004          2.2
            ACS           0.00002      0.04552       -0.0001          0.4
            AI            0.00001      0.04553        0.0001          0.3
            MA            0.00000      0.04553       -0.0001          0.1

HS-4        (constant)                                0.6190
            GCT           0.03355      0.03355       -0.0022        175.9
            CI            0.00642      0.03997       -0.0014        139.4
            GUAR          0.00431      0.04428       -0.0478        100.7

HS-5        (constant)                                0.0606
            AGE           0.01281      0.01281        0.1224        217.7
            RACE          0.00697      0.01977        0.0838        159.1
            MARIT         0.00023      0.02000        0.0245          5.2


TABLE B-2

REGRESSION RESULTS: NONGRADUATES

Regression                             Cumulative
number      Variable      ΔR²          R²            Coefficient    F

NG-1        (constant)                                0.9510
            CI            0.03002      0.03002       -0.0020        171.7
            PA            0.01154      0.04155       -0.0020         95.3
            GUAR          0.00235      0.04391       -0.0463         45.4
            AGE           0.00123      0.04513        0.0642         27.3
            GIT           0.00105      0.04618       -0.0015         26.6
            RACE          0.00061      0.04679       -0.0299         13.2
            AR            0.00053      0.04732       -0.0011         19.6
            VE            0.00034      0.04766        0.0007          8.1
            MARIT         0.00034      0.04800        0.0345          8.3
            MA            0.00025      0.04825        0.0007          6.8
            SM            0.00009      0.04834       -0.0005          2.6
            ARC           0.00009      0.04843       -0.0003          2.8
            ACS           0.00007      0.04850        0.0003          2.0
            ELI           0.00006      0.04856        0.0002          1.5
            AI            0.00000      0.04856        0.0000          0.0

NG-2        (constant)                                0.9333
            GCT           0.03386      0.03386       -0.0034        261.7
            CI            0.00726      0.04112       -0.0019        183.2
            AGE           0.00248      0.04360        0.0714         36.4

NG-3        (constant)                                0.9517
            CI            0.03002      0.03002       -0.0020        172.5
            PA            0.01154      0.04155       -0.0021        106.0
            AR            0.00145      0.04301       -0.0012         24.6
            GIT           0.00068      0.04369       -0.0014         22.6
            MA            0.00032      0.04401        0.0007          5.8
            VE            0.00015      0.04415        0.0005          4.5
            ARC           0.00010      0.04425       -0.0003          3.0
            ACS           0.00007      0.04432        0.0003          2.2
            SM            0.00007      0.04439       -0.0005          2.8
            ELI           0.00006      0.04446        0.0002          1.2
            AI            0.00002      0.04447        0.0002          0.5

NG-4        (constant)                                0.9213
            GCT           0.03386      0.03386       -0.0031        194.8
            CI            0.00726      0.04112       -0.0019        173.8
            GUAR          0.00173      0.04285       -0.0446         42.7

NG-5        (constant)                                0.3241
            RACE          0.00307      0.00307        0.0617         66.6
            AGE           0.00166      0.00473        0.0662         28.2
            MARIT         0.00041      0.00514        0.0380          9.8
APPENDIX C

CHI-SQUARE  TEST  FOR  CONSISTENCY/INDEPENDENCE 


Suppose  a sample  of  size  N is  grouped  according  to  two  characteristics  (A  and  B) 
as  follows: 


                          Characteristic B
Characteristic A      1       2      ...     k       Total

       1             f_11    f_12    ...    f_1k     m_1
       2             f_21    f_22    ...    f_2k     m_2
       3             f_31    f_32    ...    f_3k     m_3
      ...
       r             f_r1    f_r2    ...    f_rk     m_r

     Total           n_1     n_2     ...    n_k      N

where f_ij = the number of sample members (frequency) in the cell in the i-th row and
j-th column, m_i = the total of row i, n_j = the total of column j, and N = the total
sample size.


If  the  two  characteristics  are  independent,  the  expected  number  of  sample  members 
in  any  cell  ij  (i.e.,  the  expected  frequency)  is  given  by: 


    e_{ij} = \frac{m_i n_j}{N}                                      (C-1)

The observed frequency is given simply by:

    o_{ij} = f_{ij}                                                 (C-2)

The standard expression for the Chi-square statistic is:

    \chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}               (C-3)

where o_i and e_i are the observed and expected frequencies of k possible events. This
statement (i.e., C-3) is equivalent to:

    \chi^2 = \sum_{i=1}^{k} \frac{o_i^2 - 2 o_i e_i + e_i^2}{e_i}

           = \sum_{i=1}^{k} \left( \frac{o_i^2}{e_i} - 2 o_i + e_i \right)

           = \sum_{i=1}^{k} \frac{o_i^2}{e_i} - 2 \sum_{i=1}^{k} o_i + \sum_{i=1}^{k} e_i

           = \sum_{i=1}^{k} \frac{o_i^2}{e_i} - 2N + N

           = \sum_{i=1}^{k} \frac{o_i^2}{e_i} - N.                  (C-4)

Substituting from equations C-1 and C-2 above, we now have:

    \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{f_{ij}^2}{m_i n_j / N} - N

and

    \chi^2 = N \left( \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{f_{ij}^2}{m_i n_j} - 1 \right)    (C-5)



The  characteristics  used  in  this  test  need  not  be  numerical  groupings.  They  may  be 
such  things  as:  pass,  fail,  good,  average,  poor,  correct,  incorrect,  and  the  like.  For 
the  Chi-square  test  used  in  this  analysis,  characteristic  A indicates  the  performance  of 
the  model  (correct  acceptance,  etc.),  while  characteristic  B indicates  the  choice  of 
models.  Thus,  the  contingency  table  for  the  CAREM  = 0.50  Chi-square  calculation  is: 


                               Characteristic B
Characteristic A            Aggregated    Disaggregated    Total

Correct acceptances         27,422        27,342           54,764
Incorrect acceptances       11,651        11,525           23,176
Correct rejections           4,026         4,152            8,178
Incorrect rejections         2,986         3,066            6,052

Total                       46,085        46,085           92,170

which,  using  equation  C-5,  leads  to  a Chi-square  value  of  3.80. 
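
The calculation can be reproduced with a short sketch. It evaluates both the standard form (C-3, with expected frequencies from C-1) and the shortcut form (C-5) on the contingency table above; the function names are hypothetical and only the Python standard library is used.

```python
# Observed frequencies: rows are characteristic A, columns are (aggregated, disaggregated).
observed = [
    [27_422, 27_342],  # correct acceptances
    [11_651, 11_525],  # incorrect acceptances
    [4_026, 4_152],    # correct rejections
    [2_986, 3_066],    # incorrect rejections
]

def margins(table):
    """Row totals m_i, column totals n_j, and grand total N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)

def chi_square_standard(table):
    """Sum of (o - e)^2 / e over all cells, with e = m_i * n_j / N (equations C-1, C-3)."""
    m, n, total = margins(table)
    return sum((f - m[i] * n[j] / total) ** 2 / (m[i] * n[j] / total)
               for i, row in enumerate(table) for j, f in enumerate(row))

def chi_square_shortcut(table):
    """N * (sum of f^2 / (m_i * n_j) - 1), as in equation C-5."""
    m, n, total = margins(table)
    return total * (sum(f * f / (m[i] * n[j])
                        for i, row in enumerate(table) for j, f in enumerate(row)) - 1.0)

print(round(chi_square_standard(observed), 2))
print(round(chi_square_shortcut(observed), 2))
```

The two forms agree to within floating-point error, as the derivation of C-5 from C-3 requires.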

Recall that during the derivation of the Chi-square formula, an assumption was made
that "...if the two characteristics are independent...." The results may now be used to
test that assumption. The null hypothesis for the test is that characteristic A is
independent of characteristic B -- i.e., that the numbers of correct acceptances, etc., are
independent of the choice of the model. If the calculated value of the Chi-square statistic
(equation C-5) exceeds some specified critical value (from standard Chi-square tables),
then the null hypothesis may be rejected. The conclusion then would be that the models
are different. In the case of the models examined in this analysis, the ability to reject
the null hypothesis appears to be dependent upon the choice of the CAREM cutting score,
as previously reported.